Notes on using Determinantal Point Processes for Clustering with Applications to Text Clustering

نویسندگان

  • Apoorv Agarwal
  • Anna Choromanska
  • Krzysztof Choromanski
چکیده

In this paper, we compare three initialization schemes for the KMEANS clustering algorithm: 1) random initialization (KMEANSRAND), 2) KMEANS++, and 3) KMEANSD++. Both KMEANSRAND and KMEANS++ have a major that the value of k needs to be set by the user of the algorithms. (Kang 2013) recently proposed a novel use of determinantal point processes for sampling the initial centroids for the KMEANS algorithm (we call it KMEANSD++). They, however, do not provide any evaluation establishing that KMEANSD++ is better than other algorithms. In this paper, we show that the performance of KMEANSD++ is comparable to KMEANS++ (both of which are better than KMEANSRAND) with KMEANSD++ having an additional that it can automatically approximate the value of k.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models.In these cas...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

Natural scene text localization using edge color signature

Localizing text regions in images taken from natural scenes is one of the challenging problems dueto variations in font, size, color and orientation of text. In this paper, we introduce a new concept socalled Edge Color Signature for localizing text regions in an image. This method is able to localizeboth Farsi and English texts. In the proposed method rst a pyramid using diff...

متن کامل

Analyse Power Consumption by Mobile Applications Using Fuzzy Clustering Approach

With the advancements in mobile technology and its utilization in every facet of life, mobile popularity has enhanced exponentially. The biggest constraint in the utility of mobile devices is that they are powered with batteries. Optimizing mobile’s size and weight is always the choice of designer, which led limited size and capacity of battery used in mobile phone. In this paper analysis of th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1410.6975  شماره 

صفحات  -

تاریخ انتشار 2014